A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms

Sec 2. Basic concepts
1）Physical systems: degrees of freedom, underactuation, nonholonomic constraints, mode switching, piecewise continuity;

2）interactive perception

Purpose: estimate object properties; predict the effects of actions.
Can serve as: a form of self-supervised learning.
How: active learning (it recurs throughout the paper and can be used for learning both transition models and policies).

3）Hierarchical Task Decompositions and Skill Reusability
Decompose tasks top-down, layer by layer (reducing a complex task to simpler ones), and reuse skills.

4）object-centric generalization

generalization via objects—both across different objects, and between similar (or identical) objects in different task instances

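Note: this is quite difficult in practice, because it requires generalizing across different objects. E.g. at the robot level, use a compliant gripper that adapts to various objects; or, at the object level, abstract a general-purpose representation.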

Sec 3. Formalization
Purpose: give one summary description of the whole task family:

A task family is a distribution, P(M), over MDPs, each of which is a task.

$M_i = (S_i, A, R_i, T_i, \gamma, \tau)$

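Note: a skill (a higher-level action) $\stackrel{\text{modeled as}}{\longrightarrow}$ an option $o=(I_o,\beta_o,\pi_o)$, with initiation set $I_o$, termination condition $\beta_o$, and option policy $\pi_o$.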
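To make the two definitions above concrete, here is a minimal sketch of a task family and an option. Everything in it is an illustrative assumption rather than the paper's implementation: the 1-D state and action, the `Task` class (which keeps only a goal, a reward, a transition, and γ, and does not model τ), and the helper names `sample_task`, `make_reach_option`, and `run_option`.

```python
import random
from dataclasses import dataclass
from typing import Callable

State = float   # toy 1-D state: gripper position on a line (assumption)
Action = float  # toy 1-D action: displacement per step (assumption)


@dataclass
class Option:
    """A skill modeled as an option o = (I_o, beta_o, pi_o)."""
    initiation: Callable[[State], bool]    # I_o: may the option start in s?
    termination: Callable[[State], float]  # beta_o: probability of stopping in s
    policy: Callable[[State], Action]      # pi_o: low-level action chosen in s


@dataclass
class Task:
    """One toy MDP M_i; the action space is shared across tasks, so it is not stored."""
    goal: State          # stands in for the task-specific parts of M_i
    gamma: float = 0.95  # discount factor

    def reward(self, s: State) -> float:
        return -abs(self.goal - s)

    def step(self, s: State, a: Action) -> State:
        return s + a


def sample_task(rng: random.Random) -> Task:
    """Draw M_i ~ P(M); in this sketch only the goal position varies."""
    return Task(goal=rng.uniform(-1.0, 1.0))


def make_reach_option(task: Task, step_size: float = 0.1, tol: float = 0.05) -> Option:
    """Hypothetical 'reach the goal' skill for the toy task."""
    def policy(s: State) -> Action:
        delta = task.goal - s
        return max(-step_size, min(step_size, delta))  # clipped step toward the goal

    return Option(
        initiation=lambda s: abs(task.goal - s) > tol,
        termination=lambda s: 1.0 if abs(task.goal - s) <= tol else 0.0,
        policy=policy,
    )


def run_option(task: Task, option: Option, s: State,
               rng: random.Random, max_steps: int = 100) -> State:
    """Execute the option until beta_o says stop (or max_steps is reached)."""
    if not option.initiation(s):
        return s
    for _ in range(max_steps):
        s = task.step(s, option.policy(s))
        if rng.random() < option.termination(s):
            break
    return s
```

A planner or high-level policy would then choose among such options instead of choosing raw actions, which is the sense in which skills are reusable across tasks drawn from P(M).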

Sec 4. Defining and learning the state and context spaces
1）Object representation:
A. Overview: within-task or across-task (context)
B. Specific types: pose, shape, material, interaction or relative properties
C. Hierarchies: point, part, and object level (from the low level up to the high-level whole)
e.g.

point level: pixel-level features (contact points, segmentation);
part level: a mug can be seen as having an opening for pouring, a bowl for containing, a handle for grasping, and a bottom for placing;
object level: single objects, or groups of objects such as a stack of blocks.
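The point / part / object hierarchy can be pictured as nested data structures. The sketch below is only one possible encoding; the class names `Part`, `SceneObject`, and `ObjectGroup`, the `affordance` field, and the sample coordinates are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]  # a 3-D point, e.g. a contact point


@dataclass
class Part:
    name: str
    affordance: str                                    # what the part is used for
    points: List[Point] = field(default_factory=list)  # point-level data


@dataclass
class SceneObject:
    name: str
    parts: List[Part] = field(default_factory=list)    # part-level data

    def part_for(self, affordance: str) -> Optional[Part]:
        """Return the first part that offers the requested affordance, if any."""
        for part in self.parts:
            if part.affordance == affordance:
                return part
        return None


@dataclass
class ObjectGroup:
    name: str
    members: List[SceneObject]                         # group of objects


# The mug from the example above, with made-up handle contact points.
mug = SceneObject("mug", [
    Part("opening", "pouring"),
    Part("bowl", "containing"),
    Part("handle", "grasping", points=[(0.05, 0.0, 0.04), (0.05, 0.0, 0.06)]),
    Part("bottom", "placing"),
])

# A stack of blocks as a group of objects.
stack = ObjectGroup("block stack", [SceneObject(f"block_{i}") for i in range(3)])
```

A skill that only needs "something graspable" can query `part_for("grasping")` and ignore the rest of the object, which is one way part-level representations support generalization across objects.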

2）Methods: passive and interactive perception
e.g. camera observation or human imitation vs. sensing through interaction

3）Steps: discover objects; determine their degrees of freedom; estimate object properties
Note: active learning approaches are often used to select informative actions for quickly determining the model parameters.

Sec 5. Transition models
1）General form
A deterministic function $T: S \times A \longrightarrow S$ or a stochastic distribution $T: S \times A \times S \longrightarrow \mathbb{R}$
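The two signatures can be compared on a toy door that is either closed or open. This is a sketch under stated assumptions: the state and action names, the success probability `P_OPEN`, and the helper `sample_next` are invented for illustration.

```python
import random

STATES = ("closed", "open")
ACTIONS = ("push", "wait")
P_OPEN = 0.8  # assumed probability that a push actually opens the door


def t_deterministic(s: str, a: str) -> str:
    """T: S x A -> S. Pushing a closed door always opens it."""
    if s == "closed" and a == "push":
        return "open"
    return s


def t_stochastic(s: str, a: str, s_next: str) -> float:
    """T: S x A x S -> R. Returns P(s_next | s, a)."""
    if s == "closed" and a == "push":
        return P_OPEN if s_next == "open" else 1.0 - P_OPEN
    return 1.0 if s_next == s else 0.0


def sample_next(s: str, a: str, rng: random.Random) -> str:
    """Draw s' from the stochastic model."""
    weights = [t_stochastic(s, a, s2) for s2 in STATES]
    return rng.choices(STATES, weights=weights, k=1)[0]
```

For every fixed (s, a), the stochastic model's values over all s' sum to 1, which is what makes it a distribution rather than an arbitrary real-valued function.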

2）Types: continuous, discrete, and hybrid models

The discrete components of the state are often used to capture high-level task information while the continuous components capture low-level state information.

Key point: continuous models
My view: e.g. a 6-DoF action (x, y, z, rx, ry, rz) is continuous; the same holds for a state such as an object pose.
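A hybrid state can be sketched as one discrete flag plus continuous 6-DoF poses. The code below is a toy under assumptions, not a physics model: the `grasped` flag, the rule that a grasped object moves rigidly with the gripper, the additive treatment of the rotation components, and the distance tolerance are all chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import Tuple

Pose = Tuple[float, float, float, float, float, float]  # (x, y, z, rx, ry, rz)


@dataclass(frozen=True)
class HybridState:
    grasped: bool  # discrete component: high-level task information
    gripper: Pose  # continuous component: gripper pose
    obj: Pose      # continuous component: object pose


def _add(pose: Pose, delta: Pose) -> Pose:
    # Component-wise addition; a simplification, since real rotations do not add like this.
    return tuple(p + d for p, d in zip(pose, delta))


def move(s: HybridState, delta: Pose) -> HybridState:
    """Continuous action: displace the gripper; the object follows only if grasped."""
    return replace(
        s,
        gripper=_add(s.gripper, delta),
        obj=_add(s.obj, delta) if s.grasped else s.obj,
    )


def grasp(s: HybridState, tol: float = 0.01) -> HybridState:
    """Discrete mode switch: succeeds only if the gripper is within tol of the object."""
    dist = sum((g - o) ** 2 for g, o in zip(s.gripper[:3], s.obj[:3])) ** 0.5
    return replace(s, grasped=s.grasped or dist <= tol)
```

The same `move` action has different effects depending on the discrete mode, which is why the dynamics are piecewise: continuous within a mode, with switches between modes.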

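3）Stochasticity (e.g. an attempt to open a door does not always succeed) and uncertainty (which can be reduced by collecting additional data)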

4）How to learn: self-supervision and exploration
Samples: act, then observe the effect, to obtain a tuple (s, a, s').

• Random sampling.
• Active sampling: select the action samples that are the most informative.
• Intrinsic motivation: the robot actively attempts to discover novel scenarios where its model currently performs poorly or that result in salient events.
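The act-then-observe loop and the first two sampling strategies can be sketched on a toy 1-D world. The environment `true_step`, the tabular model, and in particular the count-based rule in `active_action` are assumptions: trying the least-visited action is used here only as a simple stand-in for "most informative", where a real system would more likely rely on an estimate of model uncertainty.

```python
import random
from collections import defaultdict

ACTIONS = (-1, 0, 1)
N_STATES = 5


def true_step(s: int, a: int) -> int:
    """Hidden toy dynamics: move along a line of cells bounded by walls."""
    return max(0, min(N_STATES - 1, s + a))


def random_action(s, counts, rng):
    """Random sampling: ignore what has been tried so far."""
    return rng.choice(ACTIONS)


def active_action(s, counts, rng):
    """Active sampling (count-based proxy): try the action used least often in s."""
    return min(ACTIONS, key=lambda a: counts[(s, a)])


def collect(select_action, n_steps: int, seed: int = 0):
    """Self-supervised data collection: act, observe the effect, store (s, a, s')."""
    rng = random.Random(seed)
    counts = defaultdict(int)
    data = []
    s = 0
    for _ in range(n_steps):
        a = select_action(s, counts, rng)
        s_next = true_step(s, a)  # the observed effect is the training label
        data.append((s, a, s_next))
        counts[(s, a)] += 1
        s = s_next
    return data


def fit_tabular_model(data):
    """Learn a deterministic tabular transition model from (s, a, s') samples."""
    return {(s, a): s_next for s, a, s_next in data}
```

No human labels are involved: the observed next state supervises the model. An intrinsic-motivation variant could instead reward the robot for visiting (s, a) pairs where the current model's prediction error is high.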

ScottSu's personal study site
